Last Layer Logits to Logic: Empowering LLMs with Logic-Consistent Structured Knowledge Reasoning

Li, Songze, Liu, Zhiqiang, Gong, Zhaoyan, Guo, Xiaoke, Gui, Zhengke, Chen, Huajun, Zhang, Wen

arXiv.org Artificial Intelligence

Large Language Models (LLMs) achieve excellent performance in natural language reasoning tasks through pre-training on vast unstructured text, enabling them to understand the logic in natural language and generate logic-consistent responses. However, the representational differences between unstructured and structured knowledge make LLMs inherently struggle to maintain logic consistency, leading to Logic Drift challenges in structured knowledge reasoning tasks such as Knowledge Graph Question Answering (KGQA). Existing methods address this limitation by designing complex workflows embedded in prompts to guide LLM reasoning. Nevertheless, these approaches only provide input-level guidance and fail to fundamentally address the Logic Drift in LLM outputs. Additionally, their inflexible reasoning workflows cannot adapt to different tasks and knowledge graphs. To enhance LLMs' logic consistency in structured knowledge reasoning, we specifically target the logits output from the autoregressive generation process. We propose the Logits-to-Logic framework, which incorporates logits strengthening and logits filtering as core modules to correct logical defects in LLM outputs. Extensive experiments show that our approach significantly improves LLMs' logic consistency in structured knowledge reasoning and achieves state-of-the-art performance on multiple KGQA benchmarks.
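The abstract gives no pseudocode, but the core mechanism of logits filtering and strengthening can be sketched minimally. In this sketch, `valid_ids` stands in for the set of token ids that are logically admissible at the current decoding step (e.g. derived from the KG schema), and the additive `boost` value is purely illustrative; neither name comes from the paper.

```python
import numpy as np

def filter_and_strengthen(logits, valid_ids, boost=2.0):
    """Mask tokens that would break logic consistency (filtering) and
    additively boost the logically valid ones (strengthening).
    `valid_ids` and `boost` are hypothetical stand-ins, not the paper's API."""
    out = np.full(logits.shape, -np.inf)
    out[valid_ids] = logits[valid_ids] + boost
    return out

def greedy_token(logits, valid_ids):
    # Decode greedily over the corrected logits.
    return int(np.argmax(filter_and_strengthen(logits, valid_ids)))

# Token 1 has the highest raw logit, but it is not logically valid here,
# so decoding picks token 3 instead.
print(greedy_token(np.array([1.5, 3.0, 0.2, 2.9]), valid_ids=[0, 3]))
```

The point of operating on last-layer logits rather than prompts is visible here: the invalid token can never be sampled, whatever the prompt said.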


ThoughtProbe: Classifier-Guided LLM Thought Space Exploration via Probing Representations

Wang, Zijian, Xu, Chang

arXiv.org Artificial Intelligence

This paper introduces ThoughtProbe, a novel inference-time framework that leverages the hidden reasoning features of Large Language Models (LLMs) to improve their reasoning performance. Unlike previous works that manipulate hidden representations to steer LLM generation, we harness them as discriminative signals to guide exploration of a tree-structured response space. In each node expansion, a classifier serves as a scoring and ranking mechanism that efficiently allocates computational resources by prioritizing higher-scoring candidates for continuation. After completing the tree expansion, we collect answers from all branches to form a candidate answer pool. We then propose a branch aggregation method that marginalizes over all supporting branches by aggregating their CoT scores, thereby identifying the optimal answer from the pool. Experimental results show that our framework's comprehensive exploration not only covers valid reasoning chains but also effectively identifies them, achieving significant improvements across multiple arithmetic reasoning benchmarks.
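The node-expansion step described above can be sketched as follows, under the assumption that hidden features for each candidate continuation have already been extracted from the LLM (extraction is not shown, and the probe weights are illustrative, not the paper's):

```python
import numpy as np

def expand_node(candidate_features, probe_w, probe_b, k=2):
    """Score each candidate continuation with a linear classifier over its
    hidden features, then keep only the top-k candidates for further tree
    expansion, so compute is spent on the most promising branches."""
    scores = candidate_features @ probe_w + probe_b
    keep = np.argsort(-scores)[:k]   # highest-scoring candidates first
    return keep.tolist(), scores

# Toy 2-d features for three candidate continuations and a toy probe.
feats = np.array([[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]])
keep, scores = expand_node(feats, probe_w=np.array([1.0, -1.0]), probe_b=0.0, k=2)
print(keep)  # candidates 0 and 2 survive; candidate 1 is pruned
```

Pruning at every expansion is what keeps the tree search tractable compared with exhaustively continuing all candidates.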


Supplementary Material: Relaxing Local Robustness

Neural Information Processing Systems

That is, Jia et al. provide a probabilistic guarantee that Equation A1 holds. In their evaluation, Jia et al. consider a point […]. We therefore stipulate that certification must be independent of the true label of the point being certified. While Jia et al. do not address this issue, one straightforward adaptation of their approach is to take […] Nets, which naturally satisfy affinity robustness on all non-rejected points. By the definition of y, we obtain (C2). Then, by applying (C6), we obtain (C9).



ThoughtProbe: Classifier-Guided Thought Space Exploration Leveraging LLM Intrinsic Reasoning

Wang, Zijian, Xu, Chang

arXiv.org Artificial Intelligence

Pre-trained large language models (LLMs) have been demonstrated to possess intrinsic reasoning capabilities that can emerge naturally when expanding the response space. However, the neural representation mechanisms underlying these intrinsic capabilities and approaches for their optimal utilization remain inadequately understood. In this work, we make the key discovery that a simple linear classifier can effectively detect intrinsic reasoning capabilities in LLMs' activation space, particularly within specific representation types and network layers. Based on this finding, we propose a classifier-guided search framework that strategically explores a tree-structured response space. In each node expansion, the classifier serves as a scoring and ranking mechanism that efficiently allocates computational resources by identifying and prioritizing more thoughtful reasoning directions for continuation. After completing the tree expansion, we collect answers from all branches to form a candidate answer pool. We propose a branch-aggregation selection method that marginalizes over all supporting branches by aggregating their thoughtfulness scores, thereby identifying the optimal answer from the pool. Experimental results show that our framework's comprehensive exploration not only covers valid reasoning chains but also effectively identifies them, achieving significant improvements across multiple arithmetic reasoning benchmarks.
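The branch-aggregation selection step admits a compact sketch: each completed branch contributes its thoughtfulness score to the answer it supports, and the answer with the largest aggregated support wins. The data layout below (a list of answer/score pairs) is an assumption for illustration, not the paper's interface.

```python
from collections import defaultdict

def aggregate_branches(branches):
    """Marginalize over all branches supporting the same answer by summing
    their thoughtfulness scores; return the best-supported answer.
    `branches` is a list of (answer, score) pairs, one per completed branch."""
    support = defaultdict(float)
    for answer, score in branches:
        support[answer] += score
    return max(support, key=support.get)

# Two weaker branches agreeing on "42" outvote one stronger branch for "41".
print(aggregate_branches([("42", 0.9), ("41", 1.2), ("42", 0.8)]))
```

This is why marginalizing can beat picking the single highest-scoring branch: agreement across branches is itself evidence.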


Reducing the False Positive Rate Using Bayesian Inference in Autonomous Driving Perception

Melotti, Gledson, Bastos, Johann J. S., da Silva, Bruno L. S., Zanotelli, Tiago, Premebida, Cristiano

arXiv.org Artificial Intelligence

Object recognition is a crucial step in perception systems for autonomous and intelligent vehicles, as evidenced by the numerous research works on the topic. In this paper, object recognition is explored by using multisensory and multimodality approaches, with the intention of reducing the false positive rate (FPR). Reducing the FPR is increasingly important in perception systems, since the misclassification of an object can potentially cause accidents. In particular, this work presents a strategy based on Bayesian inference to reduce the FPR, taking the likelihood function to be a cumulative distribution function from Gaussian kernel density estimations, and the prior probabilities to be cumulative functions of normalized histograms. The validation of the proposed methodology is performed on the KITTI dataset using deep networks (DenseNet, NasNet, and EfficientNet) and recent 3D point cloud networks (PointNet and PointNet++), considering three object categories (cars, cyclists, pedestrians) and the RGB and LiDAR sensor modalities.
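The Bayesian step can be sketched with a hand-rolled Gaussian KDE, combining CDF-based likelihoods with class priors via Bayes' rule. The bandwidth, the toy score samples, and the use of equal priors are all illustrative assumptions; the paper's actual priors come from normalized histograms.

```python
import numpy as np
from math import erf, sqrt

def norm_cdf(x):
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def kde_cdf(x, samples, bandwidth=0.1):
    """CDF of a Gaussian kernel density estimate fitted to per-class
    classification scores (bandwidth choice is illustrative)."""
    return float(np.mean([norm_cdf((x - s) / bandwidth) for s in samples]))

def posterior(score, class_samples, priors):
    """Bayes' rule with cumulative-function likelihoods, mirroring the
    paper's use of CDFs rather than raw densities (much simplified)."""
    lik = np.array([kde_cdf(score, s) for s in class_samples])
    post = lik * np.asarray(priors, dtype=float)
    return post / post.sum()

# Toy per-class score samples: one class clusters high, the other low.
p = posterior(0.9, [[0.85, 0.9, 0.95], [0.2, 0.25, 0.3]], priors=[0.5, 0.5])
```

A detection could then be rejected as a likely false positive whenever the posterior of the predicted class falls below a chosen threshold.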


SNAP: Efficient Extraction of Private Properties with Poisoning

Chaudhari, Harsh, Abascal, John, Oprea, Alina, Jagielski, Matthew, Tramèr, Florian, Ullman, Jonathan

arXiv.org Artificial Intelligence

Property inference attacks allow an adversary to extract global properties of the training dataset from a machine learning model. Such attacks have privacy implications for data owners sharing their datasets to train machine learning models. Several existing approaches for property inference attacks against deep neural networks have been proposed, but they all rely on the attacker training a large number of shadow models, which induces a large computational overhead. In this paper, we consider the setting of property inference attacks in which the attacker can poison a subset of the training dataset and query the trained target model. Motivated by our theoretical analysis of model confidences under poisoning, we design an efficient property inference attack, SNAP, which obtains higher attack success and requires lower amounts of poisoning than the state-of-the-art poisoning-based property inference attack by Mahloujifar et al. For example, on the Census dataset, SNAP achieves 34% higher success rate than Mahloujifar et al. while being 56.5x faster. We also extend our attack to infer whether a certain property was present at all during training and estimate the exact proportion of a property of interest efficiently. We evaluate our attack on several properties of varying proportions from four datasets and demonstrate SNAP's generality and effectiveness. An open-source implementation of SNAP can be found at https://github.com/johnmath/snap-sp23.
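The decision step of a poisoning-based property inference attack can be reduced to a simple threshold test. In SNAP the threshold follows from the authors' theoretical analysis of model confidences under poisoning; in this sketch it is just a fixed number, and the confidence values are invented for illustration.

```python
import numpy as np

def property_present(confidences, threshold):
    """Decide between 'property present' and 'property absent' worlds by
    thresholding the mean confidence the poisoned target model assigns to
    the attacker's label on query points. The threshold here is a fixed
    illustrative stand-in for SNAP's analytically derived one."""
    return bool(np.mean(confidences) > threshold)

# High confidences on the poisoned label suggest the property was present.
print(property_present([0.8, 0.9, 0.7], threshold=0.5))
```

The efficiency gain over shadow-model attacks comes from needing only these black-box confidence queries rather than training many shadow models.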


Baselines for Identifying Watermarked Large Language Models

Tang, Leonard, Uberti, Gavin, Shlomi, Tom

arXiv.org Artificial Intelligence

We consider the emerging problem of identifying the presence and use of watermarking schemes in widely used, publicly hosted, closed source large language models (LLMs). We introduce a suite of baseline algorithms for identifying watermarks in LLMs that rely on analyzing distributions of output tokens and logits generated by watermarked and unmarked LLMs. Notably, watermarked LLMs tend to produce distributions that diverge qualitatively and identifiably from those of unmarked models.

Generated Text Detection Via Statistical Discrepancies. Recent methods such as DetectGPT and GPTZero distinguish between machine-generated and human-written text by analyzing their statistical discrepancies (Tian, 2023; Mitchell et al., 2023). DetectGPT compares the log probability computed by a model on unperturbed text and perturbed variations, leveraging the observation that text sampled from an LLM generally occupies negative curvature regions of the model's log probability function. GPTZero instead uses perplexity and burstiness to distinguish human from machine text, with lower perplexity and burstiness indicating a greater likelihood of machine-generated text.
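A minimal baseline in the spirit of analyzing output-token distributions: compare the suspect model's token frequencies against an unmarked reference via KL divergence and flag large divergence. The smoothing scheme, the divergence choice, and the threshold are all assumptions for illustration, not the paper's specific algorithms.

```python
import numpy as np
from collections import Counter

def token_distribution(tokens, vocab_size):
    # Add-one-smoothed frequency distribution over the vocabulary.
    counts = Counter(tokens)
    freq = np.array([counts.get(t, 0) for t in range(vocab_size)], dtype=float)
    return (freq + 1.0) / (freq.sum() + vocab_size)

def kl_divergence(p, q):
    return float(np.sum(p * np.log(p / q)))

def looks_watermarked(suspect_tokens, reference_tokens, vocab_size, threshold=0.05):
    """Flag the suspect model if its output-token distribution diverges from
    an unmarked reference model's beyond a threshold (threshold illustrative)."""
    p = token_distribution(suspect_tokens, vocab_size)
    q = token_distribution(reference_tokens, vocab_size)
    return kl_divergence(p, q) > threshold

# A model that always emits token 0 diverges sharply from a uniform reference.
uniform = list(range(5)) * 10
print(looks_watermarked([0] * 50, uniform, vocab_size=5))
```

Statistical watermarks bias token choice, so even this crude frequency comparison can surface a strongly biased sampler; subtler schemes would need the logit-level analyses the paper describes.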